better answer
- North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
- Europe > Switzerland (0.04)
- Asia > Singapore (0.04)
- Asia > Indonesia > Bali (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.93)
- Information Technology (0.67)
- Education (0.46)
- Media (1.00)
- Leisure & Entertainment > Sports (1.00)
- Banking & Finance (1.00)
- (2 more...)
Explicit vs. Implicit Memory: Exploring Multi-hop Complex Reasoning Over Personalized Information
Zhang, Zeyu; Zhang, Yang; Tan, Haoran; Li, Rui; Chen, Xu
In large language model-based agents, memory serves as a critical capability for achieving personalization by storing and utilizing users' information. Although some previous studies have adopted memory to implement user personalization, they typically focus on preference alignment and simple question-answering. However, in the real world, complex tasks often require multi-hop reasoning over a large amount of user information, which poses significant challenges for current memory approaches. To address this limitation, we propose the multi-hop personalized reasoning task to explore how different memory mechanisms perform in multi-hop reasoning over personalized information. We explicitly define this task and construct a dataset along with a unified evaluation framework. We then implement various explicit and implicit memory methods and conduct comprehensive experiments, evaluating their performance on this task from multiple perspectives and analyzing their strengths and weaknesses. In addition, we explore hybrid approaches that combine both paradigms and propose the HybridMem method to address their limitations. We demonstrate the effectiveness of our proposed model through extensive experiments. To benefit the research community, we release this project at https://github.com/nuster1128/MPR.
- North America > United States > District of Columbia > Washington (0.05)
- Asia > China (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)
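As a concrete reading of the explicit/implicit distinction the Explicit vs. Implicit Memory abstract above draws, the sketch below contrasts a retrieval-style explicit memory (facts stored verbatim, several retrieved per hop) with a lossy implicit memory that folds facts into one compressed state. It is a hypothetical illustration, not the paper's HybridMem implementation; all class names and the toy lexical scoring are assumptions.

```python
# Hypothetical sketch: explicit memory keeps user facts verbatim and retrieves
# several of them for a multi-hop question; implicit memory compresses facts
# into one lossy state (a stand-in for parametric memory). Not the paper's code.
from dataclasses import dataclass, field


@dataclass
class ExplicitMemory:
    facts: list[str] = field(default_factory=list)

    def write(self, fact: str) -> None:
        self.facts.append(fact)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Toy lexical-overlap scoring; a real system would use embeddings.
        def overlap(fact: str) -> int:
            return len(set(query.lower().split()) & set(fact.lower().split()))
        return sorted(self.facts, key=overlap, reverse=True)[:k]


@dataclass
class ImplicitMemory:
    summary: str = ""

    def write(self, fact: str) -> None:
        # Stand-in for folding a fact into model weights or a running summary.
        self.summary = (self.summary + " " + fact).strip()


explicit, implicit = ExplicitMemory(), ImplicitMemory()
for fact in ["Alice lives in Berlin.",
             "Alice's sister is Carol.",
             "Carol works at a hospital."]:
    explicit.write(fact)
    implicit.write(fact)

# Multi-hop question: hop 1 finds the sister, hop 2 finds her workplace.
print(explicit.retrieve("Where does Alice's sister work?"))
print(implicit.summary)
```

The trade-off the paper studies is visible even in this toy: the explicit store supports targeted per-hop lookups but grows unboundedly, while the implicit state is compact but lossy.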
GRP: Goal-Reversed Prompting for Zero-Shot Evaluation with LLMs
Song, Mingyang; Zheng, Mao; Luo, Xuan
Using Large Language Models (LLMs) to evaluate and compare two answers from different models typically involves having LLM-based judges select the better answer. However, humans often approach problem-solving from the reverse perspective, for instance by choosing the worse option instead of the better one in a pairwise comparison. This kind of reverse thinking plays a crucial role in human reasoning and decision-making, and it also lets us probe the difference between original and reversed thought processes. Motivated by this, we propose a Goal-Reversed Prompting (GRP) approach for pairwise evaluation that shifts the original task from selecting the better answer to choosing the worse one, encouraging LLMs to think in reverse by prompting them to identify the worse response. Experiments on closed-source models demonstrate that GRP significantly enhances evaluation capabilities, outperforming the prompt template with the original goal.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > Spain (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
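The goal-reversed recipe in the GRP abstract above is easy to picture in code: ask the judge which answer is worse, then invert the verdict. The sketch below is a minimal, hypothetical rendering; the prompt wording and the `call_llm` stub are assumptions, not the authors' released templates.

```python
# Hypothetical sketch of Goal-Reversed Prompting (GRP) for pairwise judging.
# The prompt text and the call_llm stub are assumptions; the paper's exact
# templates may differ.

def call_llm(prompt: str) -> str:
    """Stand-in for a chat-completion call to a judge LLM."""
    raise NotImplementedError("wire up your LLM client here")


REVERSED_TEMPLATE = (
    "Question: {question}\n"
    "Answer A: {a}\n"
    "Answer B: {b}\n"
    "Which answer is WORSE? Reply with exactly 'A' or 'B'."
)


def judge_grp(question: str, a: str, b: str) -> str:
    """Ask for the worse answer, then invert to recover the better one."""
    worse = call_llm(REVERSED_TEMPLATE.format(question=question, a=a, b=b)).strip()
    # Real use should validate the reply; here anything other than 'A' maps to 'A'.
    return "B" if worse == "A" else "A"
```

The only change from a standard pairwise-judge prompt is the reversed goal in the template plus the final inversion, which is what makes the method cheap to try on existing evaluation pipelines.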
Right this way: Can VLMs Guide Us to See More to Answer Questions?
Liu, Li; Yang, Diji; Zhong, Sijia; Tholeti, Kalyana Suma Sree; Ding, Lei; Zhang, Yi; Gilpin, Leilani H.
In question-answering scenarios, humans can assess whether the available information is sufficient and seek additional information if necessary, rather than providing a forced answer. In contrast, Vision Language Models (VLMs) typically generate direct, one-shot responses without evaluating the sufficiency of the information. To investigate this gap, we identify a critical and challenging task in the Visual Question Answering (VQA) scenario: can VLMs indicate how to adjust an image when the visual information is insufficient to answer a question? This capability is especially valuable for assisting visually impaired individuals, who often need guidance to capture images correctly. To evaluate this capability in current VLMs, we introduce a human-labeled dataset as a benchmark for this task. Additionally, we present an automated framework that generates synthetic training data by simulating "where to know" scenarios. Our empirical results show significant performance improvements in mainstream VLMs when fine-tuned with this synthetic data. This study demonstrates the potential to narrow the gap between information assessment and acquisition in VLMs, bringing their performance closer to that of humans.
- North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
- Europe > Switzerland (0.04)
- Asia > Singapore (0.04)
- Asia > Indonesia > Bali (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.55)
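The behavior the VQA-guidance abstract above asks of VLMs, checking sufficiency before answering and otherwise emitting guidance, can be sketched as a two-step prompting loop. The sketch below is hypothetical: the prompts and the `vlm` stub are assumptions, not the paper's benchmark or framework.

```python
# Hypothetical sketch of the two-step behavior described above: first check
# whether the image suffices, then either answer or emit camera guidance.

def vlm(image_path: str, prompt: str) -> str:
    """Stand-in for a vision-language model call."""
    raise NotImplementedError("wire up your VLM client here")


def answer_or_guide(image_path: str, question: str) -> str:
    sufficiency = vlm(
        image_path,
        f"Question: {question}\n"
        "Does this image contain enough information to answer? "
        "Reply 'YES' or 'NO'.",
    )
    if sufficiency.strip().upper().startswith("YES"):
        return vlm(image_path, f"Answer the question: {question}")
    # Insufficient: ask for a concrete adjustment instead of a forced guess.
    return vlm(
        image_path,
        f"The image is insufficient to answer: {question}\n"
        "Give one short instruction for how to re-take the photo "
        "(e.g., 'move the camera left', 'zoom in on the label').",
    )
```

For the assistive use case the paper highlights, the guidance branch is the payoff: a short, actionable re-capture instruction rather than a hallucinated answer.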
Beyond Scalar Reward Model: Learning Generative Judge from Preference Data
Ye, Ziyi; Li, Xiangsheng; Li, Qiuchi; Ai, Qingyao; Zhou, Yujia; Shen, Wei; Yan, Dong; Liu, Yiqun
Learning from preference feedback is a common practice for aligning large language models (LLMs) with human values. Conventionally, preference data is learned and encoded into a scalar reward model that connects a value head with an LLM to produce a scalar score as preference or reward. However, scalar models lack interpretability and are known to be susceptible to biases in datasets. This paper investigates leveraging the generation capability of LLMs to address both limitations in one shot. Specifically, we prompt the pre-trained LLM to generate positive and negative judgments, both supported by rationales in natural language form. The self-generated contrastive judgment pairs are used to train the generative judge with Direct Preference Optimization (DPO). This proposal of training the generative Judge using self-generated Contrastive judgments (Con-J) ensures natural interpretability due to the generated rationales accompanying the judgments, as well as high robustness against bias without the need for an additional reward head. Experimental results show that the performance of Con-J is comparable to that of a scalar reward model trained on the same collection of preference data, and demonstrate its superior interpretability and robustness in encoding human preferences.
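The Con-J data-construction step described in this abstract can be sketched concretely: for each preference example, the pre-trained LLM is prompted once for a rationale-backed judgment favoring the preferred answer and once favoring the rejected one, and the two outputs become a DPO (chosen, rejected) pair. The sketch below is hypothetical; the prompt text and the `generate` stub are assumptions, not the paper's released code.

```python
# Hypothetical sketch of building Con-J-style DPO pairs from preference data.
# Prompt wording and the generate stub are assumptions; the paper's exact
# templates may differ.

def generate(prompt: str) -> str:
    """Stand-in for sampling from the pre-trained LLM."""
    raise NotImplementedError("wire up your LLM client here")


def build_con_j_pair(question: str, preferred: str, rejected: str) -> dict:
    base = (f"Question: {question}\n"
            f"Answer 1: {preferred}\n"
            f"Answer 2: {rejected}\n")
    # Positive judgment: rationale plus verdict agreeing with the human label.
    positive = generate(base + "Explain why Answer 1 is better, "
                               "then conclude 'Answer 1 is better.'")
    # Negative judgment: rationale plus verdict contradicting the human label.
    negative = generate(base + "Explain why Answer 2 is better, "
                               "then conclude 'Answer 2 is better.'")
    # DPO treats the label-consistent judgment as 'chosen'.
    return {"prompt": base + "Which answer is better and why?",
            "chosen": positive,
            "rejected": negative}
```

Because the chosen/rejected texts are full judgments with rationales rather than scalar scores, the trained judge explains its verdicts for free, which is the interpretability claim the abstract makes.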
ChatGPT vs. Google Bard: Which gives the better answers?
Generative AI models are the hot new thing in the Big Tech world, and everyone is joining the race. The buzz really only started with OpenAI's ChatGPT chatbot, a generative AI language model that is incredibly good at predicting which words should follow one another when you feed it prompts. Google has long been working on a similar technology, dubbed LaMDA, and with ChatGPT taking the world by storm, the company found itself forced to release some version of its AI model to the world. That's how we got Bard, Google's first publicly available chat-based generative language model, with access to many parts of the internet. But is Google really at the same level as ChatGPT already?
- North America > United States > New York (0.06)
- Europe > Sweden > Skåne County > Malmö (0.05)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.80)
DeepMind's new chatbot uses Google searches plus humans to give better answers
The difference between this approach and its predecessors is that DeepMind hopes to use "dialogue in the long term for safety," says Geoffrey Irving, a safety researcher at DeepMind. "That means we don't expect that the problems that we face in these models, either misinformation or stereotypes or whatever, are obvious at first glance, and we want to talk through them in detail. And that means between machines and humans as well," he says. DeepMind's idea of using human preferences to optimize how an AI model learns is not new, says Sara Hooker, who leads Cohere for AI, a nonprofit AI research lab. "But the improvements are convincing and show clear benefits to human-guided optimization of dialogue agents in a large-language-model setting," says Hooker. Douwe Kiela, a researcher at AI startup Hugging Face, says Sparrow is "a nice next step that follows a general trend in AI, where we are more seriously trying to improve the safety aspects of large-language-model deployments."
How To Ace ML Interview Questions
Suppose you get a call from the recruiter of your dream company, where you have applied for an ML Engineer role. You set a date and start preparing with an ML study guide like this one. On the day of the interview, you answer all the questions and are confident that you will move on to the onsite stage. However, you get a call from the recruiter saying that they have decided not to go forward. It is not enough to simply answer the question; the interviewer wants to see that you have a deep understanding of the topic.